
solve norm layer false negative gap #16642

Open
Gasoonjia wants to merge 4 commits into gh/gasoonjia/102/base from gh/gasoonjia/102/head

Conversation


@Gasoonjia Gasoonjia commented Jan 15, 2026

Stack from ghstack (oldest at bottom):

When comparing AOT intermediate outputs with runtime outputs, we assumed that AOT and runtime should produce the same output for the same operator. But if there are multiple intermediate outputs from a single operator / single operator blob, that assumption may not hold. Dropout, for example, records only the output tensor during AOT, but at runtime we record both the mask and the output tensor.

To support this, in the one-to-many scenario, instead of taking only the last element for comparison, we compare against the runtime output that shares the same size and dtype as the AOT one.
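The matching heuristic described above can be sketched as follows. This is a hypothetical, simplified helper, not the actual ExecuTorch implementation; it works on any object exposing `shape` and `dtype` attributes (as `torch.Tensor` does), with a lightweight stand-in used here for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorMeta:
    """Stand-in for a recorded tensor; real code would use torch.Tensor."""
    shape: tuple
    dtype: str


def find_matching_runtime_output(aot_output, runtime_outputs):
    """Return the first runtime output whose shape and dtype match the AOT one.

    Falls back to the last runtime element (the previous behavior) when no
    candidate matches.
    """
    for candidate in runtime_outputs:
        if candidate.shape == aot_output.shape and candidate.dtype == aot_output.dtype:
            return candidate
    return runtime_outputs[-1]
```

With dropout, the AOT trace records only the float output, while the runtime also records a boolean mask of the same shape; matching on dtype as well as shape selects the output tensor regardless of recording order.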

Differential Revision: D90790256


pytorch-bot bot commented Jan 15, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16642

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 3 Cancelled Jobs, 2 Unrelated Failures

As of commit ccb1daf with merge base b46f6b5:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 15, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Gasoonjia added a commit that referenced this pull request Jan 16, 2026
Pull Request resolved: #16642

ghstack-source-id: 334110504
@exported-using-ghexport

Differential Revision: [D90790256](https://our.internmc.facebook.com/intern/diff/D90790256/)

@GregoryComer GregoryComer left a comment


I don't fully follow why we can't capture all of the reference outputs for each op to avoid needing to heuristically match, as both portable ops and delegate calls should have matching signatures AOT and at runtime, but I'm assuming we have a good reason. I'll approve to unblock.

Returns:
The matching runtime output, or runtime_intermediate_output[-1] as fallback.
"""
# Find all runtime outputs that match the AOT shape
Member


Is it practical to capture all of the outputs AOT so that we don't have to do this?

Contributor Author


We are capturing all outputs from both AOT and runtime, but the two sets do not always match each other.
Take dropout as an example: AOT generates only the function output, but the runtime generates two outputs, the mask and the real output.
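To make the false negative concrete: assuming the runtime records the dropout output followed by its mask (the ordering here is an illustration, not taken from the source), naively comparing the AOT output against the last recorded element compares it to the mask and fails spuriously, while shape-and-dtype matching selects the right tensor:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedTensor:
    """Illustrative stand-in for a recorded intermediate output."""
    name: str
    shape: tuple
    dtype: str


# Hypothetical recorded intermediates for one dropout node.
aot = RecordedTensor("dropout_out", (2, 4), "float32")
runtime = [
    RecordedTensor("dropout_out", (2, 4), "float32"),
    RecordedTensor("dropout_mask", (2, 4), "bool"),  # extra runtime-only output
]

# Old behavior: take the last element -> picks the mask, dtype mismatch.
naive = runtime[-1]

# New behavior: match on shape and dtype -> picks the real output.
matched = next(
    t for t in runtime if t.shape == aot.shape and t.dtype == aot.dtype
)
```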


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported
